User-friendly biplots in R
Centre for Multi-Dimensional Data Visualisation (MuViSU)
muvisu@sun.ac.za
SASA 2024
The biplot is a powerful and very useful data visualisation tool.
Biplots make the information in a data table transparent, revealing the main structures in the data in a methodical way, for example patterns of correlation between variables or similarities between observations.
A biplot is a generalisation of a two-dimensional scatter diagram of data that exists in a higher dimensional space, where information on both samples and variables can be displayed graphically.
There are different types of biplots, based on various multivariate data analysis techniques.
Main Function: biplot()
Type of Biplot: PCA(), CVA(), PCO(), CA()
Aesthetics: samples(), axes(), newsamples(), newaxes()
Operations: prediction(), interpolate(), translate(), density(), fit.measures(), classify(), alpha.bags(), ellipses(), rotate(), reflect(), zoom(), regress(), splines()
Plotting: plot()
# Object of class biplot, based on 150 samples and 5 variables.
# 4 numeric variables.
# 1 categorical variable.
| Argument | Description |
|---|---|
| data | A dataframe or matrix containing all variables the user wants to analyse. |
| classes | A vector identifying class membership. Required for CVA biplots. |
| group.aes | Variable from the data to be used as a grouping variable. |
| center | A logical value indicating whether data should be column centered, with default TRUE. |
| scaled | A logical value indicating whether data should be standardised to unit column variances, with default FALSE. |
| Title | Title of the biplot to be rendered. |
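The effect of the `center` and `scaled` arguments can be illustrated in base R. This is only a sketch of the preprocessing the table describes, using `scale()` on the numeric columns of `iris`; it is not taken from the package internals, and the object names are illustrative.

```r
# Base R sketch of what `center` and `scaled` describe (not package internals).
X <- as.matrix(iris[, 1:4])

# center = TRUE, scaled = FALSE: subtract the column means only
X_centred <- scale(X, center = TRUE, scale = FALSE)

# center = TRUE, scaled = TRUE: also divide by the column standard deviations
X_standardised <- scale(X, center = TRUE, scale = TRUE)

# Column means of the centred data are zero (up to rounding)
round(colMeans(X_centred), 10)

# Column standard deviations of the standardised data are one
apply(X_standardised, 2, sd)
```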
Data: \({\bf{X}}\)
# X1 X2 X3
# 1 5.418840 5.054240 8.711160
# 2 3.129920 1.783160 3.385920
# 3 6.128080 2.173200 8.173560
# 4 6.781120 4.753280 8.731640
# 5 7.346560 5.893200 11.303040
# 6 7.208200 3.744000 10.075760
# 7 7.039440 5.213640 8.608840
# 8 5.465720 4.492640 5.596520
# 9 7.723240 4.708120 11.357480
# 10 7.109560 4.987520 7.732840
# 11 8.135800 4.392000 9.264840
# 12 6.287480 5.908720 7.488240
# 13 4.648880 7.198280 8.573720
# 14 5.798600 6.120080 8.254840
# 15 8.084560 3.234840 8.966560
# 16 6.157773 5.743455 9.899045
# 17 3.556727 2.026318 3.847636
# 18 6.963727 2.469545 9.288136
# 19 7.705818 5.401455 9.922318
# 20 8.348364 6.696818 12.844364
# 21 8.191136 4.254545 11.449727
# 22 7.999364 5.924591 9.782773
# 23 6.211045 5.105273 6.359682
# 24 8.776409 5.350136 12.906227
# 25 8.079045 5.667636 8.787318
Geometrically the rows of \({\bf{X}}\) are given as coordinates of \(n\) samples in the \(p\)-dimensional space \(\mathbb{R}^p\).
The aim is to seek an \(r\)-dimensional plane that contains the points whose coordinates are given by the rows of \({\bf{\hat{X}}}_{[r]}\) which minimises a least squares criterion given by, \[\begin{equation} || {\bf{X}} - {\bf{\hat{X}}}_{[r]}||^2 = tr\{({\bf{X}} - {\bf{\hat{X}}}_{[r]})({\bf{X}} - {\bf{\hat{X}}}_{[r]})'\}. \end{equation}\]
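The best approximation that minimises the least squares criterion is the \(r\)-dimensional Eckart-Young approximation given by \({\bf{\hat{X}}}_{[r]} = {\bf{U}} {\bf{D}}_{[r]} {\bf{V}}'\), where \({\bf{X}} = {\bf{U}} {\bf{D}} {\bf{V}}'\) is the singular value decomposition of \({\bf{X}}\) and \({\bf{D}}_{[r]}\) retains only the \(r\) largest singular values.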
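A minimal base R sketch of the Eckart-Young approximation follows. It uses only the first five rows of the data matrix listed above, column-centres them (an assumption made here for illustration), and checks that the least squares loss equals the sum of the squared discarded singular values. Object names are illustrative.

```r
# First five rows of the data matrix X shown above
X <- matrix(c(5.418840, 5.054240,  8.711160,
              3.129920, 1.783160,  3.385920,
              6.128080, 2.173200,  8.173560,
              6.781120, 4.753280,  8.731640,
              7.346560, 5.893200, 11.303040),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("X1", "X2", "X3")))

# Column-centre before decomposing (assumed here)
Xc <- scale(X, center = TRUE, scale = FALSE)

# Singular value decomposition Xc = U D V'
sv <- svd(Xc)

# Rank-r approximation: keep the r largest singular values
r <- 2
X_hat <- sv$u[, 1:r] %*% diag(sv$d[1:r]) %*% t(sv$v[, 1:r])

# Least squares criterion ||Xc - X_hat||^2
loss <- sum((Xc - X_hat)^2)

# It equals the sum of the squared discarded singular values
discarded <- sum(sv$d[-(1:r)]^2)
c(loss = loss, discarded = discarded)
```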
A standard result when \(r=2\) is that the row vectors of \({\bf{\hat{X}}}_{[2]}\) are the orthogonal projections of the corresponding row vectors of \({\bf{X}}\) onto the column space of \({\bf{V}}_2\). The projections are therefore,
\[\begin{equation} {\bf{X}} {\bf{V}}_2. \end{equation}\] These projections are also known as the first two principal components.
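As a hedged illustration in base R, the projections \({\bf{X}}{\bf{V}}_2\) can be computed from the singular value decomposition of the centred `iris` measurements and compared with the scores returned by `prcomp()`. The two agree up to the sign of each component; object names are illustrative.

```r
# Centred numeric columns of iris
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)

# First two right singular vectors
V2 <- svd(X)$v[, 1:2]

# Coordinates of the samples in the two-dimensional biplot plane
Z <- X %*% V2

# Rank-2 approximation expressed in the original four-dimensional space
X_hat2 <- Z %*% t(V2)

# The same scores from prcomp() (signs of components may differ)
pc <- prcomp(iris[, 1:4])$x[, 1:2]

head(Z, 3)
head(pc, 3)

# Residuals are orthogonal to the biplot plane
max(abs(crossprod(X - X_hat2, Z)))
```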
The columns of \({\bf{X}}\) are approximated by the rows of \({\bf{V}}_2\) (the first two columns of \({\bf{V}}\)), which now represent the axes for each variable.
We have constructed a biplot, but the variables represented by the vectors (arrows) have no calibration.
That is, there are no markers on the vectors representing the variables, unlike the calibrated axes of an ordinary scatterplot.
To construct a biplot axis with relevant markers for a variable, a \((p-1)\)-dimensional hyperplane \(\mathscr{N}\) perpendicular to the Cartesian axis is required.
For the data above, \(p = 3\), so a two-dimensional hyperplane is constructed perpendicular to \(X_1\) through a specific value of \(X_1\), say \(\mu\).
The intersection of the biplot plane \(\mathscr{L}\) and \(\mathscr{N}\) is an \((r-1)\)-dimensional intersection space, which in this case is a line. All the points on this intersection line in \(\mathscr{L}\) predict the value \(\mu\) on the \(X_1\)-axis.
The plane \(\mathscr{N}\) is then shifted orthogonally to pass through another value on \(X_1\), and all the points on the new intersection line of \(\mathscr{L}\) and \(\mathscr{N}\) predict the value that the plane passes through.
As the plane \(\mathscr{N}\) is shifted along the \(X_1\)-axis, a series of parallel intersection spaces is obtained.
Any line passing through the origin will pass through these intersection spaces and can be used as an axis fitted with markers according to the value associated with the particular intersection space.
To facilitate orthogonal projection onto the axis, similar to an ordinary scatterplot, the line orthogonal to these intersection spaces is chosen.
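The construction above can be sketched in base R for a PCA biplot of the centred `iris` data. Under this setup, the axis for variable \(k\) lies in the direction of the \(k\)-th row of \({\bf{V}}_2\), say \({\bf{v}}_k\), and the marker for a value \(\mu\) sits at \((\mu - \bar{x}_k)\,{\bf{v}}_k / ({\bf{v}}_k'{\bf{v}}_k)\). The function `marker()` and the other names are illustrative and not part of any package.

```r
X <- as.matrix(iris[, 1:4])
means <- colMeans(X)
Xc <- sweep(X, 2, means)            # column-centred data

V2 <- svd(Xc)$v[, 1:2]              # rows of V2 give the axis directions
Z <- Xc %*% V2                      # sample coordinates in the biplot plane

k <- 1                              # calibrate the axis for Sepal.Length
v_k <- V2[k, ]

# Position in the biplot plane of the marker for value mu (original units)
marker <- function(mu) (mu - means[k]) * v_k / sum(v_k^2)

# Tick marks at "pretty" values of the variable
ticks <- pretty(X[, k])
tick_xy <- t(sapply(ticks, marker))
cbind(value = ticks, tick_xy)

# Read off the prediction for sample 1: project orthogonally onto the axis,
# then convert the position on the axis back to the scale of the variable
z1 <- Z[1, ]
proj <- sum(z1 * v_k) / sum(v_k^2) * v_k
pred1 <- means[k] + sum(proj * v_k)
pred1
```

The value read off the calibrated axis equals the corresponding entry of the rank-2 approximation, which is why orthogonal projection onto the axis works as in an ordinary scatterplot.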
PCA function

| Argument | Description |
|---|---|
| bp | Object of class biplot. |
| dim.biplot | Dimension of the biplot. Only values 1, 2 and 3 are accepted, with default 2. |
| e.vects | Which eigenvectors (principal components) to extract, with default 1:dim.biplot. |
| group.aes | If not specified in biplot() |
| show.class.means | TRUE or FALSE: Indicating whether group means should be plotted in the biplot, with default FALSE. |
| correlation.biplot | TRUE or FALSE: Indicating whether distances or correlations between the variables are optimally approximated, with default FALSE. |
# # A tibble: 150 × 5
# Sepal.Length Sepal.Width Petal.Length
# <dbl> <dbl> <dbl>
# 1 5.1 3.5 1.4
# 2 4.9 3 1.4
# 3 4.7 3.2 1.3
# 4 4.6 3.1 1.5
# 5 5 3.6 1.4
# 6 5.4 3.9 1.7
# 7 4.6 3.4 1.4
# 8 5 3.4 1.5
# 9 4.4 2.9 1.4
# 10 4.9 3.1 1.5
# # ℹ 140 more rows
# # ℹ 2 more variables: Petal.Width <dbl>,
# # Species <fct>
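A minimal sketch of the pipeline for these data follows. It assumes the package presented here is installed under the name `biplotEZ` (the package name is an assumption, since the slides do not state it) and uses only the arguments listed in the tables above. The call is guarded so that it is skipped if the package is not available.

```r
# Assumption: the package is installed as "biplotEZ"
if (requireNamespace("biplotEZ", quietly = TRUE)) {
  library(biplotEZ)

  # Numeric variables as data, Species as the grouping variable
  bp <- biplot(iris[, 1:4], group.aes = iris[, 5]) |>
    PCA(dim.biplot = 2, show.class.means = FALSE, correlation.biplot = FALSE)

  # Render the two-dimensional PCA biplot
  plot(bp)
}
```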
samples(): Change the colour, plotting character and character expansion of the samples.
samples(): Select certain groups, and add labels to the samples.
samples()

| Argument | Description |
|---|---|
| label.col | Colour of labels |
| label.cex | Text expansion of the labels |
| label.side | Side at which the label of the plotted point appears - “bottom” (default), “top”, “left”, “right” |
| label.offset | Offset of the label from the plotted point |
| connected | TRUE or FALSE: whether samples are connected, with default FALSE |
| connect.col | Colour of the connecting line |
| connect.lty | Line type of the connecting line |
| connect.lwd | Line width of the connecting line |
axes(): Change the colour and line width of the axes.
axes(): Show the first two axes with vector representation and unit circle.
axes()

| Group | Arguments |
|---|---|
| Axis labels | ax.names, label.dir, label.col, label.cex, label.line, label.offset |
| Ticks | ticks, tick.size, tick.label, tick.label.side, tick.label.col |
| Prediction | predict.col, predict.lwd, predict.lty |
| Orthogonal | orthogx, orthogy |
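The aesthetic functions can be chained after the biplot type, as in the hedged sketch below. The package name `biplotEZ` and the argument names `col`, `pch`, `cex` and `lwd` are assumptions inferred from the descriptions above (colour, plotting character, character expansion, line width) and should be checked against the package documentation. The colours chosen are arbitrary.

```r
# Assumption: the package is installed as "biplotEZ"
if (requireNamespace("biplotEZ", quietly = TRUE)) {
  library(biplotEZ)

  bp <- biplot(iris[, 1:4], group.aes = iris[, 5]) |>
    PCA() |>
    # One colour per group; plotting character and size shared by all groups
    samples(col = c("orange", "purple", "darkgreen"), pch = 15, cex = 1.2) |>
    # Colour and line width of the calibrated axes
    axes(col = "grey40", lwd = 2)

  plot(bp)
}
```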
prediction(): Predict only on the variable Sepal.Length: use the which argument.
# Object of class biplot, based on 150 samples and 4 variables.
# 4 numeric variables.
#
# Sample predictions
# Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1 5.083039 3.517414 1.403214 0.2135317
# 2 4.746262 3.157500 1.463562 0.2402459
# 51 6.757521 3.449014 4.739884 1.6079559
# 52 6.389336 3.210952 4.501645 1.5094058
# 101 6.751606 2.836199 5.928106 2.1069758
# 102 5.977297 2.517932 5.070066 1.7497923
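For a two-dimensional PCA biplot of the centred data, the sample predictions are the rows of the rank-2 approximation with the column means added back. The base R sketch below computes these directly, as an illustration of the geometry rather than of the package's own code, and the result can be compared with the output above.

```r
X <- as.matrix(iris[, 1:4])
means <- colMeans(X)
Xc <- sweep(X, 2, means)                 # column-centred data

V2 <- svd(Xc)$v[, 1:2]                   # first two principal axes
Z <- Xc %*% V2                           # biplot coordinates of the samples

# Rank-2 predictions on the original scale of the variables
pred <- sweep(Z %*% t(V2), 2, means, "+")
colnames(pred) <- colnames(X)

# The same samples as in the output above
pred[c(1, 2, 51, 52, 101, 102), ]

# Predictions for a single variable only
pred[c(1, 2, 51, 52, 101, 102), "Sepal.Length"]
```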
Automatically or manually translate the axes away from the center of the plot.
On the first group
On the second group, and adding contours
On the third group, and changing the colour of the contours.
# Object of class biplot, based on 150 samples and 4 variables.
# 4 numeric variables.
#
# Quality of fit in 2 dimension(s) = 97.8%
# Adequacy of variables in 2 dimension(s):
# Sepal.Length Sepal.Width Petal.Length Petal.Width
# 0.5617091 0.5402798 0.7639426 0.1340685
# Axis predictivity in 2 dimension(s):
# Sepal.Length Sepal.Width Petal.Length Petal.Width
# 0.9579017 0.8400028 0.9980931 0.9365937
# Sample predictivity in 2 dimension(s):
# 1 2 3 4 5 6 7 8
# 0.9998927 0.9927400 0.9999141 0.9991226 0.9984312 0.9949770 0.9914313 0.9996346
# 9 10 11 12 13 14 15 16
# 0.9998677 0.9941340 0.9991205 0.9949153 0.9945491 0.9996034 0.9942676 0.9897890
# 17 18 19 20 21 22 23 24
# 0.9937752 0.9990534 0.9972926 0.9928624 0.9896250 0.9932656 0.9918132 0.9955885
# 25 26 27 28 29 30 31 32
# 0.9812917 0.9897303 0.9979903 0.9990514 0.9963870 0.9975607 0.9985741 0.9876345
# 33 34 35 36 37 38 39 40
# 0.9833383 0.9957412 0.9970200 0.9935405 0.9859750 0.9953399 0.9994047 0.9990244
# 41 42 43 44 45 46 47 48
# 0.9980903 0.9756895 0.9953372 0.9830035 0.9763861 0.9959863 0.9905695 0.9987006
# 49 50 51 52 53 54 55 56
# 0.9996383 0.9987482 0.9275369 0.9996655 0.9544488 0.9460515 0.9172857 0.9061058
# 57 58 59 60 61 62 63 64
# 0.9727694 0.9996996 0.8677939 0.8686502 0.9613130 0.9328852 0.4345132 0.9679973
# 65 66 67 68 69 70 71 72
# 0.7995848 0.9083037 0.7968614 0.5835260 0.7900027 0.8575646 0.8524748 0.6615410
# 73 74 75 76 77 78 79 80
# 0.9367709 0.8661203 0.8350955 0.8929908 0.8702600 0.9873164 0.9969031 0.6815512
# 81 82 83 84 85 86 87 88
# 0.8937189 0.8409681 0.7829405 0.9848354 0.6901625 0.8073582 0.9666041 0.6665514
# 89 90 91 92 93 94 95 96
# 0.6993846 0.9909923 0.9008345 0.9710941 0.8037223 0.9913632 0.9744493 0.7089660
# 97 98 99 100 101 102 103 104
# 0.9071738 0.9064541 0.9625371 0.9872279 0.9171603 0.9636413 0.9976224 0.9829885
# 105 106 107 108 109 110 111 112
# 0.9854704 0.9888092 0.8464463 0.9729353 0.9771293 0.9794313 0.9746239 0.9977302
# 113 114 115 116 117 118 119 120
# 0.9941859 0.9605563 0.8476794 0.9289985 0.9929982 0.9916850 0.9818957 0.9493751
# 121 122 123 124 125 126 127 128
# 0.9865358 0.8716778 0.9728177 0.9846364 0.9840890 0.9861783 0.9854516 0.9691512
# 129 130 131 132 133 134 135 136
# 0.9942007 0.9585884 0.9705389 0.9937852 0.9874192 0.9723192 0.9230503 0.9794405
# 137 138 139 140 141 142 143 144
# 0.8947527 0.9797055 0.9458421 0.9902488 0.9674660 0.9350646 0.9636413 0.9867931
# 145 146 147 148 149 150
# 0.9500265 0.9470544 0.9688318 0.9886543 0.8735433 0.9281727
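The fit measures reported above can be reproduced in base R for the two-dimensional PCA biplot of the centred `iris` data. The sketch assumes the usual definitions: quality is the share of the total sum of squares retained, adequacy is the diagonal of \({\bf{V}}_2{\bf{V}}_2'\), and axis and sample predictivities are the column-wise and row-wise ratios of fitted to total sums of squares. Object names are illustrative.

```r
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
sv <- svd(X)
V2 <- sv$v[, 1:2]

# Rank-2 approximation of the centred data
X_hat <- X %*% V2 %*% t(V2)

# Overall quality of fit in two dimensions
quality <- sum(sv$d[1:2]^2) / sum(sv$d^2)

# Adequacy of each variable
adequacy <- diag(V2 %*% t(V2))
names(adequacy) <- colnames(X)

# Axis predictivity: fitted over total sum of squares, per variable
axis_pred <- colSums(X_hat^2) / colSums(X^2)

# Sample predictivity: fitted over total sum of squares, per sample
sample_pred <- rowSums(X_hat^2) / rowSums(X^2)

round(100 * quality, 1)
adequacy
axis_pred
head(sample_pred, 8)
```

Samples with low predictivity, such as sample 63 in the output above, lie far from the biplot plane and are poorly represented in two dimensions.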